A Flexible Convex Optimization Model for Semi-supervised Clustering with Instance-level Constraints

Authors

  • Xianwen Ren
  • Yong Wang
  • Xiang-Sun Zhang

Abstract

Clustering is a common task in many applications, e.g., digital image processing, text mining, and bioinformatics. Many techniques, such as k-means, hierarchical clustering, and spectral clustering, have been proposed. In a previous study, we proposed a quadratic programming model to address the fuzzy binary clustering problem in the unsupervised setting and then extended it to the general clustering problem. In this paper, we further extend the model to the semi-supervised setting. The extended model has three salient characteristics. First, both the label and the link information of known samples can be integrated easily. Second, it shows the link between hard binary clustering and fuzzy binary clustering within one framework, theoretically suggesting the benefits of fuzzy binary clustering. Third, it comes with a fast iterative algorithm that can be applied to very large data sets. Numerical experiments on two data sets suggest its practical effectiveness and efficiency.
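The abstract does not state the model itself. As a rough illustration only, the following sketch assumes one common way to build a convex quadratic objective for fuzzy binary clustering with both kinds of side information: fuzzy memberships f_i in [-1, 1], a pairwise similarity matrix W, squared penalties pulling labeled samples toward their labels, and squared penalties for must-link and cannot-link pairs. It is minimized by projected coordinate descent, a simple iterative solver. The function name `fuzzy_binary_memberships`, the weights `lam` and `mu`, and the objective are hypothetical choices, not the authors' formulation or algorithm.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def fuzzy_binary_memberships(
    weights: Sequence[Sequence[float]],
    labels: Dict[int, float],
    must_links: Sequence[Pair] = (),
    cannot_links: Sequence[Pair] = (),
    lam: float = 10.0,
    mu: float = 1.0,
    max_iter: int = 1000,
    tol: float = 1e-10,
) -> List[float]:
    """Hypothetical convex QP for two-cluster fuzzy memberships f_i in [-1, 1].

    Objective (an illustrative assumption, not the paper's model):
        sum_{i<j} w_ij (f_i - f_j)^2            # similar points get similar f
      + lam * sum_{i labeled} (f_i - y_i)^2     # labels y_i in {-1, +1}
      + mu  * sum_{(i,j) must-link}   (f_i - f_j)^2
      + mu  * sum_{(i,j) cannot-link} (f_i + f_j)^2
    Every term is a convex square, so the whole problem is a convex QP.
    """
    n = len(weights)
    ml: List[List[int]] = [[] for _ in range(n)]
    cl: List[List[int]] = [[] for _ in range(n)]
    for i, j in must_links:
        ml[i].append(j)
        ml[j].append(i)
    for i, j in cannot_links:
        cl[i].append(j)
        cl[j].append(i)

    # Start labeled points at their labels and everything else at 0.
    f = [labels.get(i, 0.0) for i in range(n)]
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            # In f_i alone the objective is a*f_i^2 - 2*b*f_i + const.
            a = b = 0.0
            for j in range(n):
                if j != i and weights[i][j]:
                    a += weights[i][j]
                    b += weights[i][j] * f[j]
            if i in labels:
                a += lam
                b += lam * labels[i]
            for j in ml[i]:
                a += mu
                b += mu * f[j]
            for j in cl[i]:
                a += mu
                b -= mu * f[j]  # (f_i + f_j)^2 pushes f_i toward -f_j
            if a == 0.0:
                continue  # f_i does not appear in the objective; leave it
            # Exact 1-D minimizer b/a, projected onto the box [-1, 1].
            new = min(1.0, max(-1.0, b / a))
            max_change = max(max_change, abs(new - f[i]))
            f[i] = new
        if max_change < tol:
            break
    return f


if __name__ == "__main__":
    # Toy data: two tight pairs {0, 1} and {2, 3} joined by a weak edge.
    W = [[0, 1, 0, 0], [1, 0, 0.1, 0], [0, 0.1, 0, 1], [0, 0, 1, 0]]
    f = fuzzy_binary_memberships(W, labels={0: 1.0, 3: -1.0})
    print([1 if v >= 0 else -1 for v in f])  # → [1, 1, -1, -1]
```

Rounding each fuzzy membership by its sign yields a hard binary clustering, which is one simple way to picture hard and fuzzy binary assignments living in the same framework; in this toy run the memberships are roughly [0.98, 0.82, -0.82, -0.98]. Coordinate descent was chosen here only because each one-variable subproblem has a closed-form, clipped solution.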

Similar resources

An Effective Semi-Supervised Clustering Framework Integrating Pairwise Constraints and Attribute Preferences

Both instance-level knowledge and attribute-level knowledge can improve clustering quality, but how to effectively utilize both of them is an essential problem to solve. This paper proposes a wrapper framework for semi-supervised clustering, which aims to gracefully integrate both kinds of prior knowledge in the clustering process, the instance level...

Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering

Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper is aimed at proposing an efficient approach to elicit prior knowledge in terms of must-link and cannot-link from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...

On the Comparison of Semi-Supervised Hierarchical Clustering Algorithms in Text Mining Tasks

Semi-supervised clustering approaches have emerged as an option for enhancing clustering results. These algorithms use external information to guide the clustering process. In particular, semi-supervised hierarchical clustering approaches have been explored in many fields in recent years. These algorithms provide efficient and personalized hierarchical overviews of datasets. To the best of th...

Semi-supervised Clustering by Input Pattern Assisted Pairwise Similarity Matrix Completion

Many semi-supervised clustering algorithms have been proposed to improve the clustering accuracy by effectively exploring the available side information that is usually in the form of pairwise constraints. However, there are two main shortcomings of the existing semi-supervised clustering algorithms. First, they have to deal with non-convex optimization problems, leading to clustering results t...

Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...

Publication date: 2011